HyML - XML / (X)HTML generator for Hy

Motivation

Previous similar work

Installation

My environment for the sake of clarity:


In [1]:
(import hy sys)
(print "Hy version: " hy.__version__)
(print "Python" sys.version)


Hy version:  0.13.0
Python 3.5.2 |Anaconda custom (64-bit)| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]

Import main macros


In [2]:
(require [hyml.macros [*]]
         [hyml.helpers [*]])
(import (hyml.macros (*)))
(import (hyml.helpers (indent)))

Then we are ready for the show!

Almost all-in-one example

First I'd like to show an example that uses most of the features included in the HyML module. Then I will go through the presented features case by case.


In [3]:
; by default there is no indentation, thus for pretty print we use indent
(print (indent 
  ; specify parser macro (ML macros) that must be one of the following:
  ; xml, xhtml, xhtml5, html4, or html5 
  (xhtml5
  ; plain text content
  ; xml declaration below could also be done with a custom tag: (?xml :version "1.0" :encoding "UTF-8")
  "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
  ; more plain text content
  ; doctype could also be done with a custom tag: (!DOCTYPE "html")
  "<!DOCTYPE html>"
  ; define tag name as the first parameter
  ; define attributes by keywords
  (html :lang "en" :xmlns "http://www.w3.org/1999/xhtml"
    ; define nested tags and content by similar manner
    (head
      ; everything else except the first parameter and keywords are
      ; regarded as inner html content
      (title "Page title"))
    (body
      ; plain text content
      ; comments could also be done with a custom tag: (!-- "comments")
      "<!-- body starts here -->"
      ; short notation for div element and class attribute <div class=""/>
      ; note that the - character in main-container becomes _ (main_container) because of
      ; how Hy handles symbol names internally
      (.main-container
         ; short notation for class attribute for specified element: <h1 class=""/>
         ; with multiple dot notation classes are concatenated with space
         (h1.main.header
           ; unquote macro with ~ to evaluate normal Hy code
           ; after unquoted expression rest of the code is continued to be parsed by ML macros again
           ~(.capitalize "page header"))
         ; short notation for id attribute for specified element: <ul id=""/>
         ; you should not use joined #main#sub similar to class notation, although it is not prohibited,
         ; because id="main sub" is not a good id according to html attribute specifications
         (ul#main "List"
           ; unquote splice ~@ processes lists and concatenates results
           ; list-comp* is a slightly modified version of list-comp
           ; in list-comp* the list argument is the first and the expression is
           ; the second argument. in native list-comp those arguments are in reverse order
           ~@(list-comp* [[idx num] (enumerate (range 3))]
                         ; quote (`) a line and unquote variables and expressions to calculate
                         ; and set correct class for even and odd list items
                         `(li :class ~(if (even? idx) "even" "odd") ~num)))))))))


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
	<head>
		<title>Page title</title>
	</head>
	<body>
		<!-- body starts here -->
		<div class="main_container">
			<h1 class="main header">Page header</h1>
			<ul id="main">
				List
				<li class="even">0</li>
				<li class="odd">1</li>
				<li class="even">2</li>
			</ul>
		</div>
	</body>
</html>

XML, HTML4, HTML5, XHTML, and XHTML5

At the moment the HyML module contains the xml, html4, html5, xhtml, and xhtml5 macros (ML macros for short) for generating (M)arkup (L)anguage code. xml is a generic generator that allows any tag names and attributes. The html4 and xhtml macros allow only tag names specified in HTML4. The same applies to html5 and xhtml5 with HTML5 tag names. A complete chart of the allowed elements is listed at the end of the document.

Tags can be created with or without attributes, as well as with or without content. For example:


In [4]:
(println
  (xml (node))
  (xml (node :attribute "")) ; force to use empty attribute
  (xml (node :attribute "value"))
  (xml (node :attribute "value" "")) ; force to use empty content
  (xml (node :attribute "value" "Content")))


<node/>
<node attribute=""/>
<node attribute="value"/>
<node attribute="value"></node>
<node attribute="value">Content</node>
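
The rule behind the five outputs above can be written down in a few lines of plain Python. This is only a sketch of the behaviour seen in the output, not HyML's implementation; the function name render_tag and its parameters are made up for illustration. The key point is that "no content" and "empty content" are different things.

```python
def render_tag(name, attrs=None, content=None):
    """Render one element; content=None means "no content at all"."""
    attr_text = "".join(
        ' {}="{}"'.format(key, value) for key, value in (attrs or {}).items()
    )
    if content is None:
        # No content given: use the short, self-closing form.
        return "<{}{}/>".format(name, attr_text)
    # Content given, even if it is the empty string: emit start and end tags.
    return "<{0}{1}>{2}</{0}>".format(name, attr_text, content)


print(render_tag("node"))
print(render_tag("node", {"attribute": ""}))
print(render_tag("node", {"attribute": "value"}))
print(render_tag("node", {"attribute": "value"}, ""))
print(render_tag("node", {"attribute": "value"}, "Content"))
```

The sketch does no escaping of attribute values or content, which keeps it close to the output shown above.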

However, html4 and html5 have certain tags that cannot have an end tag, so the parser renders them in the correct form. Tags labeled "Forbidden" are listed at the end of the document. One of them is the meta tag:


In [5]:
(html4 (meta :name "keywords" :content "HTML,CSS,XML,JavaScript"))


Out[5]:
'<meta name=keywords content=HTML,CSS,XML,JavaScript>'

To compare the difference, let the xhtml macro print the same tag:


In [6]:
(xhtml (meta :name "keywords" :content "HTML,CSS,XML,JavaScript"))


Out[6]:
'<meta name="keywords" content="HTML,CSS,XML,JavaScript"/>'
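
The difference between the two outputs can be mimicked in plain Python. This is a sketch under assumptions, not HyML's code: the name render_void is hypothetical, and the rule for when quotes may be dropped is the HTML5 one (non-empty value without whitespace, quotes, =, <, > or backtick), which may not be exactly what HyML applies.

```python
import re

# Assumption: the HTML5 rule for unquoted attribute values.
_NEEDS_QUOTES = re.compile(r"""[\s"'=<>`]""")


def render_void(name, attrs, xhtml=False):
    """Render an element that has no content and no end tag.

    attrs is a list of (key, value) pairs so that the order is kept.
    """
    parts = [name]
    for key, value in attrs:
        if xhtml or value == "" or _NEEDS_QUOTES.search(value):
            parts.append('{}="{}"'.format(key, value))
        else:
            parts.append("{}={}".format(key, value))
    # XHTML closes the tag with "/>", HTML leaves it open.
    return "<{}{}>".format(" ".join(parts), "/" if xhtml else "")


meta = [("name", "keywords"), ("content", "HTML,CSS,XML,JavaScript")]
print(render_void("meta", meta))
print(render_void("meta", meta, xhtml=True))
```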

Shorthand macro

#㎖ (Square Ml) can be used as a shorthand reader macro for generating xml/html/xhtml code:


In [7]:
#㎖(html
    (head (title "Page title"))
    (body (div "Page content" :class "container")))


Out[7]:
'<html><head><title>Page title</title></head><body><div class="container">Page content</div></body></html>'

#㎖ actually uses the xml macro, so the same result can be achieved with the following notation, which is perhaps more convenient and is the recommended one:


In [8]:
(xml
  (html
    (head (title "Page title"))
    (body (div "Page content" :class "container"))))


Out[8]:
'<html><head><title>Page title</title></head><body><div class="container">Page content</div></body></html>'

It is not possible to use another ML macro with the #㎖ shorthand reader macro. You can, however, define your own shorthands by following these guidelines:

(defsharp {unicode-char} [code] (parse-{parser} code))

{unicode-char} can be any unicode char you want. {parser} must be one of the following available parsers: xml, xhtml, xhtml5, html4, or html5.

With the #㎖ shorthand you have to provide a single root node. Using ML macros directly makes it possible to generate multiple root-level nodes, and might be a more informative notation anyway:


In [9]:
(xml (p "Sentence 1") (p "Sentence 2") (p "Sentence 3"))


Out[9]:
'<p>Sentence 1</p><p>Sentence 2</p><p>Sentence 3</p>'

Let us then render the code instead of just printing it. This can be done with the html4> macro imported earlier from helpers:


In [10]:
(html4> "Content is " (b king) !)


Out[10]:
Content is king!

Renderers are available for all ML macros: xml>, xhtml>, xhtml5>, html4>, and html5>.

Validation and minimizing

If validation of the html tag names is a concern, use the html4, html5, xhtml, and xhtml5 macro family. In the example below we try to use the time element in html4. Because time is available in html5 only, we get a HyTMLError exception:


In [11]:
;(try
; (html4 (time))
; (catch [e [HyTMLError]]))
;hytml.macros.HyTMLError: Tag 'time' not meeting html4 specs
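
The idea of the check can be sketched in plain Python with a few rows taken from the specification table at the end of this document. The names TagSpecError, SPECS and check_tag are hypothetical stand-ins; HyML's own lookup may be organized differently.

```python
class TagSpecError(ValueError):
    """Hypothetical stand-in for HyML's own exception type."""


# A few rows from the specification table; True means "allowed".
SPECS = {
    "a":       {"html4": True,  "html5": True},
    "acronym": {"html4": True,  "html5": False},
    "time":    {"html4": False, "html5": True},
}


def check_tag(name, version):
    """Return name unchanged if version ("html4" or "html5") allows it."""
    spec = SPECS.get(name)
    if spec is None or not spec[version]:
        raise TagSpecError("Tag '{}' not meeting {} specs".format(name, version))
    return name


print(check_tag("time", "html5"))
try:
    check_tag("time", "html4")
except TagSpecError as error:
    print(error)
```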

The html4 and html5 macros also minimize attributes and tags. Under certain rules start and end tags can be removed from the output, and boolean attributes can be shortened. In the html4 and html5 macros minimizing is a default feature that cannot be bypassed. If you do not want minimized code, you must use the xhtml or xhtml5 macro. Conversely, attribute and tag minimizing is NOT available in the xhtml and xhtml5 macros. Instead, all tags are strictly closed and attributes are written in key="value" format.


In [12]:
; valid html4 document
(html4 (title) (table (tr (td "Cell 1") (td "Cell 2") (td "Cell 3"))))


Out[12]:
'<title/><table><tr><td>Cell 1<td>Cell 2<td>Cell 3</table>'

In [13]:
; in xhtml tags and attributes will be output in complete format
(xhtml (title) (table (tr (td "Cell 1") (td "Cell 2") (td "Cell 3"))))


Out[13]:
'<title/><table><tr><td>Cell 1</td><td>Cell 2</td><td>Cell 3</td></tr></table>'

Note that the above xhtml code is still not a valid xhtml document, even though the tags and attributes are output perfectly. ML macros do not validate the structure of the document, only the tag names. For validation one should use the official validator service (https://validator.w3.org/) and follow the html specifications (https://w3c.github.io/html/) to create a valid document. ML macros can guide that process, but more importantly they are meant to automate the generation of xml code while adding programming capabilities to it.

xml, on the other hand, does not care about the tag names used. They can be anything, even processed names. The same applies to keywords, values, and contents. Use the stricter xhtml, xhtml5, html4, and html5 macros to make sure that tag names correspond to the HTML4 or HTML5 specifications.
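
The end tag minimizing seen in Out[12] and Out[13] can be imitated with a small recursive renderer in plain Python. This is a rough sketch only: the set OPTIONAL_END is my assumption, and the real HTML rules for omitting an end tag also depend on what follows the element, which is ignored here.

```python
# Assumption: only these elements get their end tag dropped in this sketch.
OPTIONAL_END = {"td", "th", "tr", "li", "p"}


def render(node, minimize=False):
    """Render a (name, child, child, ...) tuple; strings are text content."""
    if isinstance(node, str):
        return node
    name, children = node[0], node[1:]
    if not children:
        return "<{}/>".format(name)
    inner = "".join(render(child, minimize) for child in children)
    # In minimizing mode the end tag of selected elements is left out.
    end = "" if minimize and name in OPTIONAL_END else "</{}>".format(name)
    return "<{}>{}{}".format(name, inner, end)


table = ("table", ("tr", ("td", "Cell 1"), ("td", "Cell 2"), ("td", "Cell 3")))
print(render(table, minimize=True))   # html4 / html5 style
print(render(table))                  # xhtml / xhtml5 style
```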

In [14]:
; see how boolean attribute minimizing works
(html4 (input :disabled "disabled"))


Out[14]:
'<input disabled>'
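
Boolean attribute minimizing boils down to a simple rule, sketched here in plain Python. BOOLEAN_ATTRS is a small sample that I picked for illustration, not the list HyML uses, and render_attr is a hypothetical name.

```python
# Assumption: a small sample of HTML boolean attributes, not a complete list.
BOOLEAN_ATTRS = {"checked", "disabled", "readonly", "selected"}


def render_attr(key, value, minimize=True):
    """Render one attribute, shortening boolean ones when minimize is set."""
    if minimize and key in BOOLEAN_ATTRS and value in (key, ""):
        # disabled="disabled" carries no extra information, so keep the name only.
        return key
    return '{}="{}"'.format(key, value)


print("<input {}>".format(render_attr("disabled", "disabled")))
print("<input {}/>".format(render_attr("disabled", "disabled", minimize=False)))
```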

Unquoting code

You can pass any code to any of the ML macros. See for example:


In [15]:
(xml (p "Sum: " (b (apply sum [[1 2 3 4]]))))


Out[15]:
'<p>Sum: <b><apply>sum<[1, 2, 3, 4]/></apply></b></p>'

But as you can see, the result was possibly not what you expected. ML macros interpret the first item of an expression as the name of the tag, so apply becomes a tag name. Until the next expression, everything else is interpreted as either content or a keyword.

However, with the ~ (unquote) symbol, the ML macro behaviour can be suspended for a moment:


In [16]:
(xml (p "Sum: " (b ~(apply sum [[1 2 3 4]])) !))


Out[16]:
'<p>Sum: <b>10</b>!</p>'

So the expression following ~ is evaluated, and the result is returned to the original parser. The rest of the code is then interpreted by the macro again. In this case that was just an exclamation mark.
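
The switch between "markup mode" and "code mode" can be illustrated with a tiny tree renderer in plain Python. This is an analogy, not how HyML works internally: HyML does this while the macro is expanded, whereas the sketch evaluates at run time. The Unquote class and render function are hypothetical names.

```python
class Unquote:
    """Marks a zero-argument callable whose result replaces the marker."""

    def __init__(self, thunk):
        self.thunk = thunk


def render(node):
    if isinstance(node, Unquote):
        # Leave markup mode, run ordinary code, and use its result as text.
        return str(node.thunk())
    if isinstance(node, str):
        return node
    name, children = node[0], node[1:]
    return "<{0}>{1}</{0}>".format(name, "".join(render(c) for c in children))


# Without the marker, the tuple is simply read as a tag called "sum".
print(render(("p", "Sum: ", ("b", ("sum", "1 2 3 4")))))
# With the marker, the code is evaluated first and its result becomes content.
print(render(("p", "Sum: ", ("b", Unquote(lambda: sum([1, 2, 3, 4]))), "!")))
```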

Note that it is not mandatory to wrap strings in "" if the input does not contain any spaces. You could also single quote simple non-spaced letter sequences. So ! is the same as "!" in this case.

Quoting and executing normal Hy code inside html gives almost unlimited possibilities to use HyML as a templating engine. Of course there is also a risk of evaluating code that breaks the execution. In addition, uncontrolled template engine code may be a security concern.
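
One common precaution with any templating approach is to escape untrusted text before it ends up in the markup. Whether and how HyML escapes content is not covered here, so the sketch below uses only html.escape from the Python standard library; safe_paragraph is a hypothetical helper.

```python
from html import escape


def safe_paragraph(user_text):
    """Wrap untrusted text in <p>, escaping markup-significant characters."""
    return "<p>{}</p>".format(escape(user_text, quote=True))


# The angle brackets, ampersand and quotes are turned into entities.
print(safe_paragraph('<b>hi</b> & "bye"'))
```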

Unquote splice

In addition to unquote, one can handle lists and iterators with ~@ (unquote-splice) symbol. This is particularly useful when a list of html elements needs to be passed to the parent element. Take for example this table head generation snippet:


In [17]:
(xhtml 
 (table (thead
   (tr ~@(list-comp
         `(th :class (if (even? ~i) "even" "odd") ~label " " ~i)
         [[i label] (enumerate (* ["col"] 3))])))))


Out[17]:
'<table><thead><tr><th class="even">col 0</th><th class="odd">col 1</th><th class="even">col 2</th></tr></thead></table>'
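
For readers coming from Python, the closest everyday analogy to ~@ is star-unpacking a list into separate arguments. The sketch below rebuilds the same table head with plain string helpers; th and element are hypothetical names and have nothing to do with HyML's internals.

```python
def th(index, label):
    """One header cell with an even/odd class, like the quoted th above."""
    css = "even" if index % 2 == 0 else "odd"
    return '<th class="{}">{} {}</th>'.format(css, label, index)


def element(name, *children):
    """Wrap any number of already rendered children in one element."""
    return "<{0}>{1}</{0}>".format(name, "".join(children))


cells = [th(i, label) for i, label in enumerate(["col"] * 3)]
# *cells spreads the list into separate arguments, much like ~@ does.
row = element("tr", *cells)
print(element("table", element("thead", row)))
```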

The list comprehension notation might seem a bit strange to some people. It takes the processing part (expression) as the first argument, and the actual list to be processed as the second argument. In nested code this moves the lists that are processed first to the end of the notation. For example:


In [18]:
(xml> 
  ~@(list-comp `(ul (b "List")
      ~@(list-comp `(li item " " ~li)
          [li uls]))
    [uls [[1 2] [1 2]]]))


Out[18]:
    List
  • item 1
  • item 2
    List
  • item 1
  • item 2

But there is another, slightly modified macro that can be used in a similar manner:

list-comp*

Let's redo the above example, this time with the dedicated list-comp* macro. Now the list to be processed is passed as the first argument to the list-comp* macro, and the expression for processing list items is the second argument. The second argument itself contains a new list processing loop, and so on until the final list items are processed. This is perhaps easier to follow for some people:


In [19]:
(xhtml
  ~@(list-comp* [uls [[1 2] [1 2]]]
    `(ul (b "List")
      ~@(list-comp* [li uls]
        `(li item " " ~li)))))


Out[19]:
'<ul><b>List</b><li>item 1</li><li>item 2</li></ul><ul><b>List</b><li>item 1</li><li>item 2</li></ul>'

Of course it is just a matter of taste which one you like. list-comp* with the unquote-splice symbol (~@) reminds us that it is possible to create similar custom macros for the HyML processor. ~@ can be thought of as a macro caller, not just a way of unquoting and executing Hy code in normal lisp mode.
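
The difference in argument order can be mimicked with two ordinary Python functions. Both helpers below are made up for this illustration and only borrow the names of the Hy forms. They produce the same result; the only thing that changes is whether the data or the expression is read first.

```python
def list_comp(expression, items):
    """Expression first, data second (the order of Hy's native list-comp)."""
    return [expression(item) for item in items]


def list_comp_star(items, expression):
    """Data first, expression second (the order of list-comp*)."""
    return [expression(item) for item in items]


def li(number):
    return "<li>item {}</li>".format(number)


data = [[1, 2], [1, 2]]

# Native order: the outer data ends up at the very end of the expression.
native = list_comp(
    lambda uls: "<ul>" + "".join(list_comp(li, uls)) + "</ul>", data)

# Starred order: the data is read first, then what is done with it.
starred = list_comp_star(
    data, lambda uls: "<ul>" + "".join(list_comp_star(uls, li)) + "</ul>")

print(native == starred)
```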

Here is a more complex table generation example from the remarkuple Python module docs. One should notice how the variables (col, row, and cell) are referenced inside the quasiquoted expressions by unquoting them with ~:


In [20]:
(html4>
  (table#data
    (caption "Data table")
    (colgroup
      (col :style "background-color:red")
      (col :style "background-color: green")
      (col :style "background-color: blue"))
    (thead
      (tr
        ~@(list-comp* [col ["Column 1" "Column 2" "Column 3"]]
          `(th ~col))))
    (tbody#tbody1
     ~@(list-comp* [row (range 1 3)]
       `(tr
         ~@(list-comp* [cell (range 3)]
           `(td  ~row "." ~cell)))))
    (tbody#tbody2
     ~@(list-comp* [row (range 1 3)]
       `(tr
         ~@(list-comp* [cell (range 3)]
           `(td  ~row "." ~cell)))))
    (tfoot 
      (tr
        (td :colspan "3" "Footer")))))


Out[20]:
Data table
Column 1  Column 2  Column 3
1.0       1.1       1.2
2.0       2.1       2.2
1.0       1.1       1.2
2.0       2.1       2.2
Footer

Address book table from CSV file

We should of course be able to use an external source for the html. Let's try with a short csv file:


In [21]:
(xhtml> 
 (table.data
   (caption "Contacts")
   ~@(list-comp*
     [[idx row] (enumerate (.split (.read (open "data.csv" "r")) "\n"))]
     (if (pos? idx) 
         `(tbody
            ~@(list-comp* [item (.split row ",")]
              `(td ~item)))
         `(thead
            ~@(list-comp* [item (.split row ",")]
              `(th ~item)))))))


Out[21]:
Contacts
Title   Name   Phone
Mr.     John   07868785831
Miss    Linda  0141-2244-5566
Master  Jack   0142-1212-1234
Mr.     Bush   911-911-911
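
For comparison, here is roughly the same job done in plain Python with the standard csv module. It is a sketch, not a translation of the Hy code: the sample text is reconstructed from the first rows of the rendered output above (the real data.csv may differ), csv_to_table is a hypothetical helper, and unlike the Hy cell it wraps every row in tr and escapes cell values.

```python
import csv
import io
from html import escape

# Assumption: sample rows reconstructed from the rendered output above.
CSV_TEXT = "Title,Name,Phone\nMr.,John,07868785831\nMiss,Linda,0141-2244-5566\n"


def cells(tag, row):
    """Render every value of one row as a <th> or <td> cell."""
    return "".join("<{0}>{1}</{0}>".format(tag, escape(value)) for value in row)


def csv_to_table(text, caption):
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    head_html = "<thead><tr>{}</tr></thead>".format(cells("th", header))
    body_html = "".join("<tr>{}</tr>".format(cells("td", row)) for row in body)
    return '<table class="data"><caption>{}</caption>{}<tbody>{}</tbody></table>'.format(
        escape(caption), head_html, body_html)


print(csv_to_table(CSV_TEXT, "Contacts"))
```

Using csv.reader instead of splitting on commas also copes with quoted fields that contain commas.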

Templates

It is possible to load code from an external file too. This feature has not been deeply implemented yet, but the next example gives a feel for it. First I'm just going to show the content of the external template file:


In [22]:
(with [f (open "templates/template.hy")] (print (f.read)))


(html :lang ~lang
  (head (title ~title))
  (body
  	(p ~body)))

Then I use the include macro to read and process the content:


In [23]:
(defvar lang "en"
        title "Page title"
        body "Content")

(xhtml ~@(include "templates/template.hy"))


Out[23]:
'<html lang="en"><head><title>Page title</title></head><body><p>Content</p></body></html>'

All globally defined variables are likewise available in ML macros:


In [24]:
(xhtml ~lang ", " ~title ", " ~body)


Out[24]:
'en, Page title, Content'
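
If you know Python's string.Template, the template file plus include works in a similar spirit: a document with named holes that are filled from variables. The sketch below is only an analogy using the standard library; the PAGE string is my stand-in for templates/template.hy, and unlike HyML it cannot run code inside the template.

```python
from string import Template

# Assumption: a $-placeholder string plays the role of templates/template.hy.
PAGE = Template(
    '<html lang="$lang"><head><title>$title</title></head>'
    "<body><p>$body</p></body></html>"
)

# Each keyword argument fills the placeholder with the same name.
html = PAGE.substitute(lang="en", title="Page title", body="Content")
print(html)
```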

HTML4 / 5 specifications

xml does not care about markup specifications beyond the general tag and attribute notation. It knows nothing about the naming conventions of tags, their relation to each other, or the global structure of the markup document. It is entirely the user's responsibility to get these right.

The html4 and html5 macros render tags as specified below. These macros minimize code when possible. Using an undefined tag raises an error. Attributes are not validated, however. One should use the official validator for proper validation.

Below is the last example of using ML macros. It will print the first 5 rows of the HTML4/5 specifications.

Columns are:

  • Tag name
  • Tag title
  • Forbidden (if there should be no content or end tag)
  • Omit (forbidden plus omit short tag like <col>)
  • HTML4 (is html4 compatible?)
  • HTML5 (is html5 compatible?)

In [25]:
(xhtml>
  (table.data
    (caption "HTML Element Specifications")
    (thead
      (tr
        ~@(list-comp* [col ["Tag name" "Tag title" "Forbidden" "Omit" "HTML4" "HTML5"]]
          `(th ~col))))
    (tbody 
     ~@(list-comp* [[id row] (take 5 (.items (do (import (hyml.macros (specs))) specs)))]
       (do
        `(tr
          (td ~(.upper (get row :name)))
          (td ~(get row :name))
          (td ~(get row :forbidden))
          (td ~(get row :omit))
          (td ~(get row :html4) :class (if ~(get row :html4) "html4" ""))
          (td :class (if ~(get row :html5) "html5" ""))))))))


Out[25]:
HTML Element Specifications
Tag name  Tag title  Forbidden  Omit   HTML4  HTML5
A         a          False      False  True
ABBR      abbr       False      False  True
ACRONYM   acronym    False      False  True
ADDRESS   address    False      False  True
APPLET    applet     False      False  True
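
Once the specifications are available as a mapping, simple questions about them are easy to answer in plain Python. The sketch below copies a handful of rows from the full table in this document into an ordinary dict; SPECS and tags_where are hypothetical names, and the real specs object in hyml.macros uses Hy keywords as keys.

```python
# A few rows copied from the full specification table in this document.
SPECS = {
    "a":       {"forbidden": False, "html4": True,  "html5": True,  "omit": False},
    "acronym": {"forbidden": False, "html4": True,  "html5": False, "omit": False},
    "br":      {"forbidden": True,  "html4": True,  "html5": True,  "omit": True},
    "time":    {"forbidden": False, "html4": False, "html5": True,  "omit": False},
    "wbr":     {"forbidden": True,  "html4": False, "html5": True,  "omit": True},
}


def tags_where(**conditions):
    """Return sorted tag names whose spec matches every keyword condition."""
    return sorted(
        name for name, spec in SPECS.items()
        if all(spec[key] == wanted for key, wanted in conditions.items())
    )


print(tags_where(html4=False, html5=True))  # tags new in html5
print(tags_where(forbidden=True))           # tags without content or end tag
print(tags_where(html5=False))              # tags dropped from html5
```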

In [26]:
; let's import pandas for an easy table view
(import [pandas])
; set max rows to 200 to prevent pruning displayed rows
(pandas.set_option "display.max_rows" 200)
; disable jupyter notebook autoscroll on the next cell

In [27]:
%javascript IPython.OutputArea.prototype._should_scroll = function(lines) {return false}



In [28]:
; show all specs
(pandas.DataFrame.transpose (pandas.DataFrame specs))


Out[28]:
:forbidden :html4 :html5 :name :omit :title
:a False True True a False Anchor
:abbr False True True abbr False Abbreviation
:acronym False True False acronym False Acronym
:address False True True address False Address
:applet False True False applet False Java applet
:area True True True area True Image map region
:article False False True article False Defines an article
:aside False False True aside False Defines content aside from the page content
:audio False False True audio False Defines sound content
:b False True True b False Bold text
:base True True True base True Document base URI
:basefont True True False basefont False Base font change
:bdi False False True bdi False Isolates a part of text that might be formatte...
:bdo False True True bdo False BiDi override
:big False True False big False Large text
:blockquote False True True blockquote False Block quotation
:body False True True body False Document body
:br True True True br True Line break
:button False True True button False Button
:canvas False False True canvas False Used to draw graphics, on the fly, via scripti...
:caption False True True caption False Table caption
:center False True False center False Centered block
:cite False True True cite False Citation
:code False True True code False Computer code
:col True True True col True Table column
:colgroup False True True colgroup False Table column group
:datalist False False True datalist False Specifies a list of pre-defined options for in...
:dd False True True dd False Definition description
:del False True True del False Deleted text
:details False False True details False Defines additional details that the user can v...
:dfn False True True dfn False Defined term
:dialog False False True dialog False Defines a dialog box or window
:dir False True False dir False Directory list
:div False True True div False Generic block-level container
:dl False True True dl False Definition list
:dt False True True dt False Definition term
:em False True True em False Emphasis
:embed False False True embed False Defines a container for an external (non-HTML)...
:fieldset False True True fieldset False Form control group
:figcaption False False True figcaption False Defines a caption for a <figure> element
:figure False False True figure False Specifies self-contained content
:font False True False font False Font change
:footer False False True footer False Defines a footer for a document or section
:form False True True form False Interactive form
:frame True True False frame False Frame
:frameset False True False frameset False Frameset
:h1 False True True h1 False Level-one heading
:h2 False True True h2 False Level-two heading
:h3 False True True h3 False Level-three heading
:h4 False True True h4 False Level-four heading
:h5 False True True h5 False Level-five heading
:h6 False True True h6 False Level-six heading
:head False True True head False Document head
:header False False True header False Defines a header for a document or section
:hr True True True hr True Horizontal rule
:html False True True html False HTML document
:i False True True i False Italic text
:iframe False True True iframe False Inline frame
:img True True True img True Inline image
:input True True True input True Form input
:ins False True True ins False Inserted text
:isindex True True True isindex False Input prompt
:kbd False True True kbd False Text to be input
:keygen False False True keygen True Defines a key-pair generator field (for forms)
:label False True True label False Form field label
:legend False True True legend False Fieldset caption
:li False True True li False List item
:link True True True link True Document relationship
:main False False True main False Specifies the main content of a document
:map False True True map False Image map
:mark False False True mark False Defines marked/highlighted text
:menu False True True menu False Menu list
:menuitem False False True menuitem False Defines a command/menu item that the user can ...
:meta True True True meta True Metadata
:meter False False True meter False Defines a scalar measurement within a known ra...
:nav False False True nav False Defines navigation links
:noframes False True False noframes False Frames alternate content
:noscript False True True noscript False Alternate script content
:object False True True object False Object
:ol False True True ol False Ordered list
:optgroup False True True optgroup False Option group
:option False True True option False Menu option
:output False False True output False Defines the result of a calculation
:p False True True p False Paragraph
:param True True True param True Object parameter
:picture False False True picture False Defines a container for multiple image resources
:pre False True True pre False Preformatted text
:progress False False True progress False Represents the progress of a task
:q False True True q False Short quotation
:rp False False True rp False Defines what to show in browsers that do not s...
:rt False False True rt False Defines an explanation/pronunciation of charac...
:ruby False False True ruby False Defines a ruby annotation (for East Asian typo...
:s False True True s False Strike-through text
:samp False True True samp False Sample output
:script False True True script False Client-side script
:section False False True section False Defines a section in a document
:select False True True select False Option selector
:small False True True small False Small text
:source True False True source True Defines multiple media resources for media ele...
:span False True True span False Generic inline container
:strike False True False strike False Strike-through text
:strong False True True strong False Strong emphasis
:style False True True style False Embedded style sheet
:sub False True True sub False Subscript
:summary False False True summary False Defines a visible heading for a <details> element
:sup False True True sup False Superscript
:table False True True table False Table
:tbody False True True tbody False Table body
:td False True True td False Table data cell
:textarea False True True textarea False Multi-line text input
:tfoot False True True tfoot False Table foot
:th False True True th False Table header cell
:thead False True True thead False Table head
:time False False True time False Defines a date/time
:title False True True title False Document title
:tr False True True tr False Table row
:track True False True track True Defines text tracks for media elements (<video...
:tt False True False tt False Teletype text
:u False True True u False Underlined text
:ul False True True ul False Unordered list
:var False True True var False Variable
:video False False True video False Defines a video or movie
:wbr True False True wbr True Defines a possible line-break

In [29]:
; include notebook custom styles
(IPython.display.HTML (.read (open "styles.css" "r")))


Out[29]:

The MIT License

Copyright (c) 2017 Marko Manninen